Detection of negative emotion burst topic in microblog text stream
LI Yanhong, ZHAO Hongwei, WANG Suge, LI Deyu
Journal of Computer Applications    2020, 40 (12): 3458-3464.   DOI: 10.11772/j.issn.1001-9081.2020060880
Detecting negative emotion burst topics promptly in massive, noisy microblog text streams is essential for emergency response. However, traditional burst topic detection methods often ignore the differences between negative emotion burst topics and other burst topics. Therefore, a Negative Emotion Burst Topic Detection (NE-BTD) algorithm for microblog text streams was proposed. Firstly, the accelerations of keyword pairs in microblogs and the change rate of negative emotion intensity were used as the basis for identifying negative emotion topics. Secondly, the speeds of burst word pairs were used to determine the window range of negative emotion burst topics. Finally, the Gibbs Sampling Dirichlet Multinomial Mixture (GSDMM) clustering algorithm was used to obtain the topic structures of the negative emotion burst topics in the window. In the experiments, the proposed NE-BTD algorithm was compared with an existing Emotion-Based Method of Topic Detection (EBM-TD) algorithm. The results show that the NE-BTD algorithm was at least 20% higher in accuracy and recall than the EBM-TD algorithm, and that it detected negative emotion burst topics at least 40 minutes earlier.
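The abstract does not define "speed" and "acceleration" precisely. As a minimal sketch, assuming they are the first and second differences of a keyword pair's co-occurrence count across consecutive time windows, they could be computed as follows. The function names, the windowing, and the toy posts are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations

def pair_counts(window_posts):
    """Count unordered keyword pairs that co-occur within each post of one window."""
    counts = Counter()
    for words in window_posts:
        for pair in combinations(sorted(set(words)), 2):
            counts[pair] += 1
    return counts

def speed_and_acceleration(windows, pair):
    """Return the pair's count per window plus its first and second differences."""
    series = [pair_counts(posts)[pair] for posts in windows]
    speed = [b - a for a, b in zip(series, series[1:])]
    accel = [b - a for a, b in zip(speed, speed[1:])]
    return series, speed, accel

# Three consecutive time windows of tokenized posts (toy data).
windows = [
    [["flood", "road"], ["rain", "city"]],
    [["flood", "rescue"], ["flood", "road"]],
    [["flood", "rescue"], ["flood", "rescue"],
     ["rescue", "flood", "road"], ["flood", "rescue", "city"]],
]
series, speed, accel = speed_and_acceleration(windows, ("flood", "rescue"))
print(series, speed, accel)
```

A pair whose acceleration exceeds some threshold would be a candidate burst pair; the threshold and the combination with the emotion-intensity change rate are left out because the abstract does not specify them.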
Fast unsupervised feature selection algorithm based on rough set theory
BAI Hexiang, WANG Jian, LI Deyu, CHEN Qian
Journal of Computer Applications    2015, 35 (8): 2355-2359.   DOI: 10.11772/j.issn.1001-9081.2015.08.2355
To address the problem that feature selection on the large-scale data sets common in "big data" applications is too slow to meet practical requirements, a fast feature selection algorithm for unsupervised massive data sets was proposed based on the incremental absolute reduction algorithm of traditional rough set theory. Firstly, the large-scale data set was regarded as a random object sequence, and the candidate reduct was initialized as empty. Secondly, objects were drawn at random from the data set one by one without replacement. Each drawn object was checked to see whether the candidate reduct could distinguish it from the objects already in the current object set, and was then merged into that set; if the candidate reduct could not distinguish the new object, an attribute that could distinguish it was added to the candidate reduct. Finally, once I successive objects had been distinguishable using the candidate reduct, the candidate reduct was taken as the reduct of the large-scale data set. Experiments on five unsupervised large-scale data sets demonstrated that a reduct distinguishing no less than 95% of object pairs could be found within 1% of the time needed by the discernibility-matrix-based algorithm and the incremental absolute reduction algorithm. In a text topic mining experiment, the topics found from the reduced data set were consistent with those found from the original data set. The experimental results show that the proposed algorithm can obtain effective reducts for large-scale data sets in practical time.
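The sampling loop described in this abstract can be illustrated with a small sketch. This is a naive reading under stated assumptions, not the authors' implementation: objects are tuples of attribute values, the attribute added is simply the first one that separates the two objects, and `patience` is a hypothetical name for the parameter I. The pairwise check here is quadratic, whereas the paper presumably uses a faster test.

```python
import random

def fast_reduct(objects, patience, seed=0):
    """Grow a candidate reduct while drawing objects without replacement."""
    rng = random.Random(seed)
    order = rng.sample(range(len(objects)), len(objects))  # random draw order
    reduct = []   # candidate reduct: attribute indices, initially empty
    seen = []     # current object set
    streak = 0    # successive objects that needed no new attribute
    for index in order:
        new = objects[index]
        grew = False
        for old in seen:
            if old == new:
                continue  # identical objects cannot be told apart by any attribute
            if all(old[a] == new[a] for a in reduct):
                # Assumption: add the first attribute on which the two objects differ.
                a = next(k for k in range(len(new)) if old[k] != new[k])
                reduct.append(a)
                grew = True
        seen.append(new)
        streak = 0 if grew else streak + 1
        if streak >= patience:
            break  # stop early after `patience` successive distinguishable objects
    return sorted(reduct)

# Toy data: the third attribute is constant, so it never enters the reduct.
data = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
reduct = fast_reduct(data, patience=10)
print(reduct)
```

With `patience` larger than the data set, every object is visited and the result separates all distinct pairs; a smaller value trades completeness for speed, which matches the "no less than 95% of object pairs" result reported above.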

Kernel improvement of multi-label feature extraction method
LI Hua, LI Deyu, WANG Suge, ZHANG Jing
Journal of Computer Applications    2015, 35 (7): 1939-1944.   DOI: 10.11772/j.issn.1001-9081.2015.07.1939
To address the problem that the label kernel functions in multi-label feature extraction methods do not take the correlation between labels into consideration, two methods for constructing new label kernel functions were proposed. In the first method, the multi-label data were transformed into single-label data, so that the correlation between labels could be characterized by the label set; a new label kernel function was then defined from the perspective of the loss function of single-label data. In the second method, mutual information was used to characterize the correlation between labels, and a new label kernel function was proposed from the perspective of mutual information. Experiments on three real-life data sets using two multi-label classifiers demonstrated that the feature extraction method with the loss-function-based label kernel performed best on all measures, improving the five evaluation measures by 10% on average; in particular, on the Yeast data set, the Coverage measure declined by about 30%. The feature extraction method with the mutual-information-based label kernel followed closely, improving the five evaluation measures by 5% on average. The theoretical analysis and simulation results show that feature extraction methods based on the new output kernel functions can effectively extract features, simplify the learning process of multi-label classifiers and, moreover, improve the performance of multi-label classification.
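The abstract does not say how mutual information enters the kernel. As a minimal sketch of the quantity itself, the mutual information between two label columns of a binary label matrix can be estimated from empirical frequencies. The function name and the toy label matrix `Y` are hypothetical, and natural logarithms are an arbitrary choice.

```python
import math
from collections import Counter

def label_mutual_information(Y, i, j):
    """Empirical mutual information (in nats) between label columns i and j."""
    n = len(Y)
    joint = Counter((row[i], row[j]) for row in Y)
    marginal_i = Counter(row[i] for row in Y)
    marginal_j = Counter(row[j] for row in Y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((marginal_i[a] / n) * (marginal_j[b] / n)))
    return mi

# Toy label matrix: labels 0 and 1 always agree, label 2 is independent of label 0.
Y = [(1, 1, 0), (1, 1, 1), (0, 0, 0), (0, 0, 1)]
mi_dependent = label_mutual_information(Y, 0, 1)
mi_independent = label_mutual_information(Y, 0, 2)
print(mi_dependent, mi_independent)
```

Perfectly correlated balanced labels give ln 2, and independent labels give 0, so the value can serve as a pairwise correlation score between labels.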

Real-time detection framework for network intrusion based on data stream
LI Yanhong, LI Deyu, CUI Mengtian, LI Hua
Journal of Computer Applications    2015, 35 (2): 416-419.   DOI: 10.11772/j.issn.1001-9081.2015.02.0416
Access requests to a computer network arrive in real time and change dynamically. To detect network intrusion in real time and adapt to dynamic changes in network access data, a real-time network intrusion detection framework based on data streams was proposed. First of all, a misuse detection model and an anomaly detection model were combined, and a knowledge base made up of normal patterns and abnormal patterns was established by initial clustering. Secondly, the similarity of network access data to the normal and abnormal patterns was measured using the dissimilarity between a data point and a data cluster, and the legitimacy of the network access data was determined accordingly. Finally, as the network access data stream evolved, the knowledge base was updated by reclustering to reflect the current state of network access. Experiments on the intrusion detection dataset KDDCup99 show that, with 10000 initial clustering samples, 10000 clustering samples in the buffer, and an adjustment coefficient of 0.9, the proposed framework achieves a recall rate of 91.92% and a false positive rate of 0.58%. This approaches the result of the traditional non-real-time detection model, yet the whole process of learning and detection scans the network access data only once. With the introduction of the knowledge base update mechanism, the proposed framework has advantages in the real-time performance and adaptability of intrusion detection.
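The detection step and the two reported metrics can be sketched as follows. This is an illustration under assumptions, not the paper's framework: each pattern in the knowledge base is reduced to a centroid, Euclidean distance stands in for the point-to-cluster dissimilarity, and the reclustering update and adjustment coefficient are omitted. All names and data are hypothetical.

```python
import math

def nearest_pattern(point, knowledge_base):
    """Label a point with the label of its closest pattern centroid."""
    centroid, label = min(knowledge_base, key=lambda entry: math.dist(point, entry[0]))
    return label

def recall_and_false_positive_rate(truth, predicted):
    """Recall on abnormal records and false positive rate on normal records."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == "abnormal" and p == "abnormal" for t, p in pairs)
    fn = sum(t == "abnormal" and p == "normal" for t, p in pairs)
    fp = sum(t == "normal" and p == "abnormal" for t, p in pairs)
    tn = sum(t == "normal" and p == "normal" for t, p in pairs)
    return tp / (tp + fn), fp / (fp + tn)

# Toy knowledge base: one normal and one abnormal pattern.
knowledge_base = [((0.0, 0.0), "normal"), ((5.0, 5.0), "abnormal")]
# Toy stream of (features, true label); each record is examined once, in order.
stream = [
    ((0.2, 0.1), "normal"), ((4.8, 5.3), "abnormal"), ((0.5, 0.4), "normal"),
    ((2.9, 2.9), "abnormal"), ((2.0, 2.2), "normal"), ((2.6, 2.7), "normal"),
]
predicted = [nearest_pattern(x, knowledge_base) for x, _ in stream]
truth = [label for _, label in stream]
recall, fpr = recall_and_false_positive_rate(truth, predicted)
print(predicted, recall, fpr)
```

In this toy stream the last normal record lies closer to the abnormal centroid and is flagged, which is the kind of error the false positive rate counts.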
